High-Dimensional Clustering with Sparse Gaussian Mixture Models
نویسنده
چکیده
We consider the problem of clustering high-dimensional data using Gaussian Mixture Models (GMMs) with unknown covariances. In this context, the ExpectationMaximization algorithm (EM), which is typically used to learn GMMs, fails to cluster the data accurately due to the large number of free parameters in the covariance matrices. We address this weakness by assuming that the mixture model consists of sparse gaussian distributions and leveraging this assumption in a novel algorithm for learning GMMs. Our approach incorporates the graphical lasso procedure for sparse covariance estimation into the EM algorithm for learning GMMs, and by encouraging sparsity, it avoids the problems faced by traditional GMMs. We guarantee convergence of our algorithm and show through experimentation that this procedure outperforms the traditional Expectation Maximization algorithm and other clustering algorithms in the high-dimensional clustering setting.
منابع مشابه
Regularized Parameter Estimation in High-Dimensional Gaussian Mixture Models
Finite gaussian mixture models are widely used in statistics thanks to their great flexibility. However, parameter estimation for gaussian mixture models with high dimensionality can be challenging because of the large number of parameters that need to be estimated. In this letter, we propose a penalized likelihood estimator to address this difficulty. The [Formula: see text]-type penalty we im...
متن کاملEfficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures
We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with two non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for le...
متن کاملSparse oracle inequalities for variable selection via regularized quantization
We give oracle inequalities on procedures which combines quantization and variable selection via a weighted Lasso k-means type algorithm. The results are derived for a general family of weights, which can be tuned to size the influence of the variables in different ways. Moreover, these theoretical guarantees are proved to adapt the corresponding sparsity of the optimal codebooks, suggesting th...
متن کاملMinimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance as well as fundamental limits in high-dimensional settings are not well-understood. In this paper, we provide precise information theoretic bounds on the clustering accuracy and sample complexity of learning a mixture...
متن کاملEfficient mixture model for clustering of sparse high dimensional binary data
In this paper we propose a mixture model, SparseMix, for clustering of sparse high dimensional binary data, which connects model-based with centroid-based clustering. Every group is described by a representative and a probability distribution modeling dispersion from this representative. In contrast to classical mixture models based on EM algorithm, SparseMix: is especially designed for the pro...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010